Visible-Infrared Person Re-Identification Using Privileged Intermediate Information
Visible-infrared person re-identification (ReID) aims to recognize the same
person of interest across a network of RGB and infrared (IR) cameras. Some deep learning
(DL) models have directly incorporated both modalities to discriminate persons
in a joint representation space. However, this cross-modal ReID problem remains
challenging due to the large domain shift in data distributions between RGB and
IR modalities. This paper introduces a novel approach for creating an
intermediate virtual domain that acts as a bridge between the two main domains
(i.e., RGB and IR modalities) during training. This intermediate domain is
considered privileged information (PI) that is unavailable at test time, which
allows this cross-modal matching task to be formulated as a learning under
privileged information (LUPI) problem. We devised a new method to generate images
between visible and infrared domains that provide additional information to
train a deep ReID model through an intermediate domain adaptation. In
particular, by employing color-free and multi-step triplet loss objectives
during training, our method provides common feature representation spaces that
are robust to large visible-infrared domain shifts. Experimental results on
challenging visible-infrared ReID datasets indicate that our proposed approach
consistently improves matching accuracy, without any computational overhead at
test time. The code is available at:
https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI
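The abstract mentions color-free representations and multi-step triplet loss objectives but gives no implementation details. The sketch below is a minimal illustration, assuming a PyTorch setup, of how a color-free (grayscale) view of the visible images could be paired with a standard triplet loss; the function names, luminance weights, and margin are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def color_free(rgb_batch):
    """Turn an RGB batch (B, 3, H, W) into a 3-channel grayscale batch.
    Grayscale discards color, giving a simple color-free intermediate view
    of the visible modality (assumed BT.601 luminance weights)."""
    weights = torch.tensor([0.299, 0.587, 0.114], device=rgb_batch.device)
    gray = (rgb_batch * weights.view(1, 3, 1, 1)).sum(dim=1, keepdim=True)
    return gray.repeat(1, 3, 1, 1)  # keep the backbone input shape unchanged

def triplet_step(backbone, anchor_rgb, positive_ir, negative_ir, margin=0.3):
    """One illustrative step: pull the color-free anchor towards an IR image
    of the same identity and push it away from an IR image of another identity."""
    f_a = backbone(color_free(anchor_rgb))   # intermediate (privileged) view
    f_p = backbone(positive_ir)              # same identity, IR modality
    f_n = backbone(negative_ir)              # different identity, IR modality
    return F.triplet_margin_loss(f_a, f_p, f_n, margin=margin)
```

In a multi-step scheme, additional triplet terms (e.g., an RGB anchor against intermediate positives) could be summed in the same way; the single term above only shows the mechanics.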
Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data
The re-identification (ReID) of individuals over a complex network of cameras
is a challenging task, especially under real-world surveillance conditions.
Several deep learning models have been proposed for visible-infrared (V-I)
person ReID to recognize individuals from images captured using RGB and IR
cameras. However, performance may decline considerably if RGB and IR images
captured at test time are corrupted (e.g., noise, blur, and weather
conditions). Although various data augmentation (DA) methods have been explored
to improve the generalization capacity, these are not adapted for V-I person
ReID. In this paper, a specialized DA strategy is proposed to address this
multimodal setting. Given both the V and I modalities, this strategy helps
diminish the impact of corruption on the accuracy of deep person ReID models.
Corruption may be modality-specific, and an additional modality often provides
complementary information. Our multimodal DA strategy is designed specifically
to encourage modality collaboration and reinforce generalization capability.
For instance, punctual masking of modalities forces the model to select the
informative modality. Local DA is also explored for advanced selection of
features within and among modalities. The impact of training baseline fusion
models for V-I person ReID using the proposed multimodal DA strategy is
assessed on corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD
datasets in terms of complexity and efficiency. Results indicate that using our
strategy provides V-I ReID models with the ability to exploit both shared and
individual modality knowledge, so they outperform models trained with no DA or
with unimodal DA. GitHub code: https://github.com/art2611/ML-MDA
Comment: 8 pages of main content, 2 pages of references, 2 pages of
supplementary material, 3 figures, WACV 2023 RWS workshop
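The abstract states that masking modalities forces the model to rely on whichever modality remains informative, without detailing the mechanism. The following is a minimal sketch, assuming a PyTorch-style pipeline in which paired V and I images feed a fusion model; the masking probability and the zero-fill choice are illustrative assumptions, not the released ML-MDA code.

```python
import torch

def mask_modalities(v_batch, i_batch, p_mask=0.2):
    """Randomly blank one modality per sample so a fusion model cannot rely
    on a single stream (the masking probability is an assumed value)."""
    b = v_batch.size(0)
    drop_v = torch.rand(b, device=v_batch.device) < p_mask
    drop_i = torch.rand(b, device=i_batch.device) < p_mask
    drop_i &= ~drop_v  # never mask both modalities of the same sample
    v_out, i_out = v_batch.clone(), i_batch.clone()
    v_out[drop_v] = 0.0  # masked visible images
    i_out[drop_i] = 0.0  # masked infrared images
    return v_out, i_out
```

Such a step would typically be applied on the fly inside the training loop, alongside the usual per-modality augmentations.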
Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification
Visible-infrared (V-I) person re-identification (ReID) seeks to retrieve images
of the same individual captured over a distributed network of RGB and IR sensors.
Several V-I ReID approaches directly integrate both V and I modalities to
discriminate persons within a shared representation space. However, given the
significant gap in data distributions between V and I modalities, cross-modal
V-I ReID remains challenging. Some recent approaches improve generalization by
leveraging intermediate spaces that can bridge V and I modalities, yet
effective methods are required to select or generate data for such informative
domains. In this paper, the Adaptive Generation of Privileged Intermediate
Information (AGPI^2) training approach is introduced to adapt and generate a
virtual domain that bridges discriminant information between the V and I
modalities. The key motivation behind AGPI^2 is to enhance the training of a deep V-I ReID
backbone by generating privileged images that provide additional information.
These privileged images capture shared discriminative features that are not
easily accessible within the original V or I modalities alone. Towards this
goal, a non-linear generative module is trained with an adversarial objective,
translating V images into intermediate spaces with a smaller domain shift
w.r.t. the I domain. Meanwhile, the embedding module within AGPI^2 aims to
produce similar features for both V and generated images, encouraging the
extraction of features that are common to all modalities. In addition to these
contributions, AGPI^2 employs adversarial objectives for adapting the
intermediate images, which play a crucial role in creating a
non-modality-specific space to address the large domain shifts between V and I
domains. Experimental results conducted on challenging V-I ReID datasets
indicate that AGPI^2 increases matching accuracy without extra computational
resources during inference.
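The abstract describes two cooperating modules: an adversarially trained generator that maps V images into an intermediate domain closer to I, and an embedding backbone pushed to produce similar features for V and generated images. Below is a minimal sketch of how such a joint generator-side objective might be composed, assuming PyTorch modules; the module names, loss choices, and weighting are illustrative assumptions, and the discriminator's own update on real I images versus generated images is omitted.

```python
import torch
import torch.nn.functional as F

def generator_side_step(generator, discriminator, backbone, v_imgs, lam=1.0):
    """One illustrative training step combining an adversarial term with a
    feature-consistency term between V and intermediate images."""
    z_imgs = generator(v_imgs)  # intermediate (privileged) images from V inputs

    # Adversarial term: make intermediate images look like I images
    # according to the discriminator (non-saturating GAN loss assumed).
    d_fake = discriminator(z_imgs)
    adv_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    # Consistency term: the backbone should embed V and intermediate images
    # similarly, encouraging modality-shared features.
    consist_loss = F.mse_loss(backbone(v_imgs), backbone(z_imgs))

    return adv_loss + lam * consist_loss
```

A full training loop would alternate this step with discriminator updates and identity-classification or triplet losses on the ReID backbone.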